Metagenomic investigation of the equine faecal microbiome reveals extensive taxonomic diversity

Background: The horse plays crucial roles across the globe, including in horseracing, as a working and companion animal and as a food animal. The horse hindgut microbiome makes a key contribution in turning a high fibre diet into body mass and horsepower. However, despite its importance, the horse hindgut microbiome remains largely undefined. Here, we applied culture-independent shotgun metagenomics to thoroughbred equine faecal samples to deliver novel insights into this complex microbial community.
Results: We performed metagenomic sequencing on five equine faecal samples to construct 123 high- or medium-quality metagenome-assembled genomes from Bacteria and Archaea. In addition, we recovered nearly 200 bacteriophage genomes. We document surprising taxonomic diversity, encompassing dozens of novel or unnamed bacterial genera and species, to which we have assigned new Candidatus names. Many of these genera are conserved across a range of mammalian gut microbiomes.
Conclusions: Our metagenomic analyses provide new insights into the bacterial, archaeal and bacteriophage components of the horse gut microbiome. The resulting datasets provide a key resource for future high-resolution taxonomic and functional studies on the equine gut microbiome.


INTRODUCTION
The horse has played a crucial role in human development and in the extension of human settlement (Roberts, 2017). Domestication of the horse began at least 6,000 years ago and led to diversification into numerous breeds, accompanied by significant biological changes (Fages et al., 2019). The horse remains an important component of human society, with around 60 million horses worldwide (Clarkson, 2017). Horses provide health benefits through horse-riding and equine-assisted therapy alongside playing roles as working animals across the globe, in transport, agriculture or policing. The horse remains an important food animal globally, with five million animals slaughtered for food each year and horsemeat now in favor as a low-methane red-meat alternative to beef (Belaunzaran et al., 2015). In the UK, there are around 374,000 horse-owning households and horseracing is the second most attended sport in the country after football, contributing £4.7 billion to the UK economy (British Equine Trade Association, 2019).
As a foraging herbivore, the horse relies on a cellulose-rich diet of grass and legumes. However, unlike cattle, horses have no rumen to digest complex carbohydrates. Instead, they rely on hindgut fermentation, an efficient but enigmatic process (far less well understood than ruminal digestion) that depends on a rich microbial community, the hindgut microbiome, encompassing bacteria, archaea and viruses, together with fungi and other eukaryotic microbes (Costa & Weese, 2018; Julliand & Grimm, 2016; Santos et al., 2011). This ecosystem plays a key role in nutrient assimilation and feed conversion, effectively turning grass into horseflesh and horsepower. The horse gut also acts as a reservoir of equine and several human pathogens, as well as a source of antimicrobial resistance (Maddox et al., 2015).
Crucially, various diseases are associated with disturbances in hindgut microbial ecology, including foal diarrhoea, colitis, laminitis, colic and equine grass sickness (Leng et al., 2018). Thus, by better understanding the equine hindgut microbiome, we stand to inform interventions that can improve the health and welfare, performance, value and longevity of horses.
Previous studies of the horse hindgut microbiome have documented a rich variety of microorganisms (spanning phyla from all three domains of life) and have shown that the taxonomic composition of this community varies with age, breed and disease status and has changed during domestication (Costa & Weese, 2018; Julliand & Grimm, 2016; Leng et al., 2018; Massacci et al., 2020; O'Donnell et al., 2013; Proudman et al., 2015; Stewart et al., 2018; Metcalf et al., 2017; Leng et al., 2019). However, earlier studies have largely relied on short-read meta-barcoding analyses of 16S rRNA gene sequences, which are limited in that they fail to provide resolution down to the species or strain level, provide limited insight into population structures or functional repertoires of microbial species and fail to cover viruses and eukaryotes. Thus, despite previous efforts (and drawing on comparisons with the human microbiome, where new species are still being discovered; Forster et al., 2019), the horse hindgut microbiome presents us with a vast, only superficially explored (Di Pietro et al., 2021) landscape of taxonomic, ecological and functional diversity, certain to encompass important, yet undiscovered roles. Babenko et al. (2020) emphasize this with their preliminary exploration of the equine faecal virome, presenting a rich, taxonomically diverse viral community which is thought to be essential in shaping microbial ecology. As in studies of the human gut microbiome, faeces provides ready non-invasive access to the gut contents. Application of short-read metagenomics to complex environmental microbial communities has proven capable of recovering large-scale catalogues of near-complete genomes, vastly expanding the tree of life to include multiple phyla with no known cultured representative (Parks et al., 2017).
Drawing on these principles, as a component within the Alborada Well Foal study (a cohort study of equine gut microbial development and health), we applied shotgun metagenomics to five equine faecal samples from 12-month-old thoroughbreds to expand our knowledge of this microbial landscape.

Sample collection and storage
Faecal samples were obtained from five 12-month-old Thoroughbred racehorses from the same farm and field in Ireland. All samples were collected in April 2019 from horses raised on permanent pasture of mixed ryegrass. Horses were not being exercised at the time of sample collection. Feed supplementation whilst at pasture was proprietary post-weaning cereal and trace element pellets plus an additional trace mineral and amino acid supplement. All horses had received ivermectin and praziquantel paste four weeks prior to sampling. Samples were collected as part of the Alborada Well Foal study, under the University of Surrey's ethical review framework, project code: NERA-2017-007-SVM. 100 g of freshly evacuated faeces was collected from each horse in sterile tubes before immediate storage at 4 °C on site at the stud. All samples were shipped the same day at ambient temperature and received within 24 h. Upon receipt, samples were refrigerated before being aliquoted and stored at −80 °C until DNA extraction. Samples were thawed and homogenized before DNA extraction using the DNeasy PowerSoil kit (Qiagen), following manufacturer's instructions. Extracted DNA was stored at −20 °C before further analysis.
Taxonomic profiling of sequencing reads was performed using Kraken 2 (Wood, Lu & Langmead, 2019) to search a microbial database built from archaeal, bacterial, fungal, protozoan, viral and UniVec_Core sequences in RefSeq in January 2021. Bracken was used to estimate taxon abundance from Kraken 2 profiles, accepting only those taxa with >1,000 assigned reads (Lu et al., 2017). Bracken database files were generated using "bracken-build" on our microbial database, and the resulting profiles were visualised using Pavian (Breitwieser & Salzberg, 2016).
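The read-count threshold applied to the Bracken estimates can be illustrated with a minimal Python sketch. The function name, taxon labels and counts are hypothetical and invented for illustration; this is not part of Kraken 2 or Bracken.

```python
MIN_READS = 1000  # threshold stated in the text: keep taxa with >1,000 assigned reads

def filter_taxa(read_counts, min_reads=MIN_READS):
    """Return only the taxa whose estimated read count exceeds min_reads."""
    return {taxon: n for taxon, n in read_counts.items() if n > min_reads}

# Invented example counts (not data from this study)
bracken_example = {"taxon_A": 250_000, "taxon_B": 1_000, "taxon_C": 1_001}
kept_taxa = filter_taxa(bracken_example)
print(kept_taxa)
```

Note that the comparison is strict, so a taxon with exactly 1,000 reads is excluded.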
DAS Tool was applied to the output from all three bin predictors, generating a catalogue of 196 bins from five samples (Sieber et al., 2018). All bins were profiled against the BAM file for their source metagenomic sample using the anvi'o 'anvi-profile' workflow (Eren et al., 2015). Using the 'anvi-interactive' tool, each bin was refined manually according to GC content, single copy core gene (SCG) taxonomy and coverage as well as detection statistics. CheckM v1.0.11 (Parks, Imelfort & Skennerton, 2015) was used for quality assessment of all bins using the lineage_wf function. Bins showing >50% completion and <10% contamination were assessed for quality score (defined as estimated genome completeness score minus five times estimated contamination score), a commonly used standard for defining acceptable bin quality (Parks et al., 2017). Bins with <70% completion and/or a quality score of <50 were categorised as low-quality metagenome-assembled genomes (MAGs) (n = 29); those with >70% completion, <10% contamination and quality score >50 were categorised as medium-quality MAGs (n = 68) and those with >90% completion, <5% contamination and quality score >50 were classified as high-quality MAGs (n = 55).
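The quality score and the three quality tiers defined above can be written out as a short Python sketch. The function names are hypothetical, and the handling of exact boundary values (for example, a completeness of exactly 70%) is an assumption, since the text only gives strict inequalities.

```python
def quality_score(completeness, contamination):
    """Quality score as defined in the text: completeness - 5 x contamination."""
    return completeness - 5.0 * contamination

def classify_bin(completeness, contamination):
    """Assign a bin to a quality tier using the thresholds given in the text.

    Assumption: values falling exactly on a threshold are placed in the
    lower tier, because the text specifies only strict inequalities.
    """
    qs = quality_score(completeness, contamination)
    if completeness > 90 and contamination < 5 and qs > 50:
        return "high"
    if completeness > 70 and contamination < 10 and qs > 50:
        return "medium"
    return "low"

# Invented CheckM-style estimates (percent), for illustration only
print(classify_bin(95.0, 2.0))  # quality score 85
print(classify_bin(80.0, 4.0))  # quality score 60
print(classify_bin(92.0, 9.0))  # quality score 47
```

Because contamination is weighted five-fold, a bin that is 92% complete but 9% contaminated scores below 50 and is classed as low quality despite its high completeness.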

Taxonomic and phylogenetic profiling of MAGs
Medium- and high-quality MAGs from all five samples were de-replicated at 95% average nucleotide identity (ANI) with a default aligned fraction of >10% using dRep v2.0 (Olm et al., 2017), to create a non-redundant species catalogue. Clustering at 99% ANI was used to identify a non-redundant strain catalogue and select a representative MAG per strain. CompareM v0.1.1 (Oksanen et al., 2019) was used to assign Average Amino-acid Identity (AAI) values followed by AAI clustering at 60% to allow delineation at the genus level.
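The idea of grouping genomes at an identity threshold (95% ANI for species, 99% ANI for strains, 60% AAI for genera) can be sketched in Python as follows. This is a simplified single-linkage illustration with invented identity values; it is an assumption made for clarity and does not reproduce the clustering algorithms implemented in dRep or CompareM.

```python
def cluster_by_identity(genomes, identity, threshold):
    """Single-linkage clustering: two genomes share a cluster when they are
    connected by a chain of pairs whose identity is >= threshold."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, shortening the path as we go
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in identity.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())

# Invented pairwise ANI values (percent), for illustration only
mags = ["m1", "m2", "m3", "m4"]
ani = {("m1", "m2"): 98.7, ("m2", "m3"): 96.1, ("m1", "m3"): 94.0, ("m3", "m4"): 80.0}

species_clusters = cluster_by_identity(mags, ani, 95.0)
strain_clusters = cluster_by_identity(mags, ani, 99.0)
print(species_clusters)
print(strain_clusters)
```

In this toy example m1 and m3 fall into the same species-level cluster through m2 even though their direct ANI is below 95%, which is a known property of single linkage and one reason real tools may use other linkage rules.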

Abundance and metabolic profiling of MAGs
To estimate the proportion of reads within each BioSample represented by our final, de-replicated MAG catalogue, contigs from the non-redundant MAG catalogue were concatenated and filtered reads aligned back to this MAG database using Bowtie 2 (Langmead & Salzberg, 2012). Ordered BAM files were assessed using anvi'o (Parks, Imelfort & Skennerton, 2015) to calculate coverage statistics per-contig, allowing the calculation of mean coverage across each assembled genome according to methods available at: https://merenlab.org/data/2017_Delmont_et_al_HBDs/ and described by Delmont et al. (2018). Species accumulation and distribution analyses were conducted using the Vegan package in R (Oksanen et al., 2019) before visualisation using ggplot2 (Wickham, 2016).
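One plausible way to derive a mean coverage per genome from per-contig statistics is a length-weighted mean, sketched below in Python. The function name, input layout and values are assumptions made for illustration; this is not the anvi'o implementation nor necessarily the exact calculation described by Delmont et al. (2018).

```python
from collections import defaultdict

def mean_genome_coverage(contigs):
    """Length-weighted mean coverage per genome.

    contigs: iterable of (genome_id, contig_length_bp, mean_contig_coverage).
    """
    covered_bases = defaultdict(float)
    total_length = defaultdict(int)
    for genome, length_bp, coverage in contigs:
        covered_bases[genome] += length_bp * coverage
        total_length[genome] += length_bp
    return {g: covered_bases[g] / total_length[g] for g in total_length}

# Invented per-contig values, for illustration only
contig_table = [
    ("MAG_1", 1000, 2.0),
    ("MAG_1", 3000, 6.0),
    ("MAG_2", 500, 0.5),
]
genome_coverage = mean_genome_coverage(contig_table)
print(genome_coverage)
```

Weighting by contig length prevents short, highly covered contigs from dominating the genome-level estimate.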
Bacteriophage contigs from the catalogue were used as queries in a BLASTn search against the NCBI non-redundant nucleotide database (conducted on 21/12/2020) using an e-value of ≤1e−5. Only matches with a query cover >50% and percentage identity >70% were selected as being significant. Initial taxonomic classification of phage genomes at order and family level was performed using https://github.com/feargalr/Demovir against a viral subset of the non-redundant TrEMBL database with an e-value of ≤1e−5. For each viral contig, individual coding sequences were predicted using Prodigal (Hyatt et al., 2010), before concatenation for input into vCONTACT2 v0.9.19 (Bin Jang et al., 2019) for construction of a gene-sharing network incorporating a de-replicated RefSeq database of reference prokaryotic virus genomes. The resulting network was visualised using Cytoscape v3.8.0 (Shannon et al., 2003).
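The significance criteria for BLASTn matches stated above can be expressed as a small Python filter. The record layout is an assumption: the keys borrow the BLAST+ tabular field names evalue, qcovs and pident, and the hits themselves are invented for illustration.

```python
E_VALUE_MAX = 1e-5       # e-value cut-off stated in the text
MIN_QUERY_COVER = 50.0   # query cover must exceed 50%
MIN_IDENTITY = 70.0      # percentage identity must exceed 70%

def is_significant(hit):
    """Apply the three thresholds given in the text to one BLASTn hit."""
    return (
        hit["evalue"] <= E_VALUE_MAX
        and hit["qcovs"] > MIN_QUERY_COVER
        and hit["pident"] > MIN_IDENTITY
    )

# Invented hits, for illustration only
blast_hits = [
    {"query": "contig_1", "evalue": 1e-30, "qcovs": 82.0, "pident": 91.5},
    {"query": "contig_2", "evalue": 1e-30, "qcovs": 50.0, "pident": 99.0},  # cover not >50%
    {"query": "contig_3", "evalue": 1e-3, "qcovs": 90.0, "pident": 95.0},   # e-value too high
    {"query": "contig_4", "evalue": 1e-8, "qcovs": 75.0, "pident": 65.0},   # identity too low
]
significant_queries = [h["query"] for h in blast_hits if is_significant(h)]
print(significant_queries)
```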

Reference-based profiling documents microbial diversity
The five faecal samples, derived from 12-month-old Thoroughbred horses, each yielded >6 ng/µl DNA; whole genome sequencing collectively generated >280 million paired reads, or >84 Gbp of sequence data. Reads derived from the horse genome accounted for <1% of reads from each sample (Table S1). We initially analysed reads using the k-mer-based program Kraken 2, followed by refined phylogenetic analysis via the allied program Bracken. Such analyses revealed extensive novelty and diversity in the equine faecal microbiome, with >59% of sequence reads in each sample classified by Kraken as "unassigned", i.e. from unknown organisms. Assignable reads represented all three domains of life, as well as viruses, although bacteria predominated, accounting for >89% of assigned reads in any sample (Table S2).
Bacterial reads were predominantly assigned to the four phyla in the NCBI taxonomy most associated with animal gut microbiomes: Proteobacteria, Firmicutes, Bacteroidetes and Actinobacteria. However, the Kraken 2 profiles also provided evidence of over thirty additional bacterial phyla in this ecosystem. Many of these appear to be novel in the context of the horse gut, including Deinococcus-Thermus, Thermotogae and the Candidatus phylum Cloacimonetes (also called WWE1), which has been reported almost exclusively from anaerobic fermenters and the aqueous environment (Calusinska et al., 2018; Limam et al., 2014). However, as this phylum has recently been detected in soil fertilised with manure from dairy cattle, chickens and swine and has been implicated in anaerobic digestion of cellulose, it may play similarly important roles in the vertebrate gut (Limam et al., 2014; Laconi et al., 2021). Reads assigned to eukaryotes provided evidence of budding yeasts and apicomplexan parasites in these samples.
Remarkably, two samples showed a very high relative abundance of reads assigned to the genus Acinetobacter (44% and 66% of classified reads), mirroring similar findings on two healthy horses in a previous study using 16S rRNA gene sequences (Costa et al., 2012). Bracken assigns these reads to an implausible sixty-two species of Acinetobacter, which is more likely to represent misassignment of reads rather than genuine diversity within this genus in this context.

Over a hundred newly named bacterial species
We generated almost 200 non-redundant bins from single-sample assemblies using three different approaches to binning. Of these, 123 bins represent medium- or high-quality metagenome-assembled genomes (MAGs), 96 of them with ≥15 amino acid tRNAs (Tables S3 and S4). Genome sizes ranged from ~0.5 to 3.8 Mbp, while GC content ranged from 31% to 60%. De-replication at 95% ANI clustered MAGs into 110 species clusters, spanning ten phyla (Fig. 1A). An average of 18% of the initial, host-depleted metagenomic reads per sample were represented within the final, de-replicated MAG catalogue. According to GTDB, around half (48%) of the MAG species clusters belonged to the Bacteroidota, while just over a third (35%) belonged to the Firmicutes (split by GTDB into Firmicutes, Firmicutes_A and Firmicutes_C). Only fourteen of the bacterial species from the horse gut had been previously defined and delineated: nine with validly published Latin binomials and five simply with alphanumerical designations assigned by GTDB (these are placeholder names assigned when no well-formed Latin name exists for the species) (Table S5).
Two of the species with validly published names, Ligilactobacillus hayakitensis (synonym Lactobacillus hayakitensis) (Morita et al., 2007) and Limosilactobacillus equigenerosi (synonym Lactobacillus equigenerosi) (Endo et al., 2008), have been previously cultured from the faeces of thoroughbred racehorses and are thought to be positively associated with equine intestinal health (Morita et al., 2009). Similarly, the species Streptococcus equinus was named in the early twentieth century after its association with horse dung and has been repeatedly isolated from this source (Andrewes & Horder, 1906;Smith & Shattock, 1962). Another of the named species found among our MAGs, Treponema succinifaciens, has been reported from the equine gut by 16S studies (Daly et al., 2001), but ours represents the first report of a genome from this species in this setting.
The recently named species Acinetobacter lanii (Zhu et al., 2021) has been isolated from the Tibetan wild ass Equus kiang, but our MAG represents the first report of an association between this species and the domesticated horse. Although the genus Phascolarctobacterium is known to inhabit the horse gut (Metcalf et al., 2017), here we provide the first evidence of a specific link between the horse and the species P. succinatutens, previously found in human and pig faeces (Watanabe, Nagai & Morotomi, 2012). Our MAG catalogue provides the first report in the horse of the species Pseudomonas lundensis, first isolated from meat, but now recognised as an emerging pathogen of humans (Molin, Ternström & Ursing, 1986; Scales et al., 2018).
Among our MAG species clusters, ninety-six represent new candidate species within sixty-one bacterial genera previously delineated by GTDB. The majority of these novel species had <85% ANI to their closest known representative within GTDB databases (Fig. 1B). Sixty of these genera occur in the gut microbiota of at least one additional mammalian host species. Eleven of our species that could be assigned only to the level of family fell into ten clusters (delineated at 60% AAI) representing novel candidate genera from seven different families (Table S6). The archaeal genus Methanocorpusculum is thought to play a role in methane production in the equine gut (Murru et al., 2018). Here, we have delineated a novel species from this ecosystem: Candidatus Methanocorpusculum equi.
Figure 1 Taxonomic classification of 110 MAG species clusters derived from five metagenomic equine faecal samples. (A) Depicted as a phylogenetic tree, where phylum, as assigned by GTDB, is indicated by colour range. All GTDB-Tk assigned subdivisions of the Firmicutes phylum have been collapsed to a single 'Firmicutes' designation. The tree was based upon an alignment of 16 concatenated ribosomal proteins and constructed using FastTree. The final tree was visualised and manually annotated using the online iTOL v5.7 tool. Phylum-level taxonomy is described by branch colour according to GTDB designation (phyla with an alphabetical suffix have been collapsed). The presence (filled) or absence (hollow) of
Building on our recent efforts with the chicken gut microbiome (Gilroy et al., 2021) and with the automated creation of well-formed Latin names, we have created Candidatus names (abbreviated as Ca.) for all the unnamed taxa revealed by our metagenomic analyses (Table 1). We also created Latin names for species and genera recognised by GTDB, but previously assigned only alphanumeric designations. For taxa found only in the horse, we created names that incorporated Greek or Latin roots for this host (e.g., Ca. Equimonas). However, if searches of the GTDB and NCBI databases suggested that genera had representatives in other gut microbiomes, we opted for names that specified gut or faeces as habitat (e.g., Ca. Limimonas).

A novel class within the Armatimonadetes
One of our MAGs (and the associated species cluster, which we have called Ca. Hippobium faecium) was assigned to the family of alphanumeric designation UBA5829. Assigned by GTDB to its own class, order and family, all members of this family belong to the recently named phylum Armatimonadetes (also called Armatimonadota; previously known as OP10) (Tamaki et al., 2011). Scrutiny of the NCBI database in August 2021 reveals that no genome assemblies linked to this phylum originate from the vertebrate gut, instead being metagenome-assembled genomes largely derived from bioreactors. Ca. Hippobium faecium was found at >1× coverage in two samples (SAMN13344080 & SAMN13344082), with relative abundance of this species across both samples being 94% and 4% respectively.

Distribution and metabolism
Our de-replicated high- and medium-quality MAGs account for 18% (±5%) of our host-depleted metagenomic reads. Distribution analysis identified 17 species present at ≥1× coverage in all samples (core MAGs represent 15% of our de-replicated MAG catalogue), spanning four bacterial phyla and the archaea (Fig. 2A and Table S7). No species were present at ≥10× coverage in all samples. While the majority of identified MAG species clusters had predominant relative abundance in only one sample, species including Ca. Methanocorpusculum equi, Acinetobacter lanii, Ca. Colimonas fimequi and Ca. Colisoma equi had a more uniform distribution across all samples, suggesting a more central function in equine health. Species quantification shows a steady incline in the cumulative number of species identified when successively adding each of the five separate
Figure 1 (continued) genes associated with catalysing carbohydrate degradation (blue) or aiding in the metabolism of short chain fatty acids (red) are reported in the associated binary plot. (B) Average Nucleotide Identity (ANI) between recovered MAGs and their closest representative within the GTDB database (release 202). Only MAGs placed within a previously recognised genus, and whereby this taxonomic assignment was inclusive of an ANI measurement, are shown. Individual plots are coloured according to GTDB designated phylum, with phyla assigned an alphabetical suffix being collapsed. A dotted line is placed at 95% ANI, representing the utilised species-level boundary.
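The core-species criterion used in the distribution analysis (≥1× coverage in every sample) can be illustrated with a minimal Python sketch. The function name and the coverage values are invented for illustration; only the threshold and the counts 17 and 110 are taken from the text.

```python
def core_species(coverage, min_cov=1.0):
    """Return species whose mean coverage is >= min_cov in every sample."""
    return sorted(
        species
        for species, per_sample in coverage.items()
        if all(c >= min_cov for c in per_sample)
    )

# Invented mean coverage per species across five samples
coverage_example = {
    "species_A": [3.2, 1.0, 5.1, 2.4, 1.7],   # >= 1x everywhere
    "species_B": [12.0, 0.4, 0.0, 0.9, 0.2],  # abundant in one sample only
    "species_C": [0.0, 0.0, 2.5, 0.0, 0.0],
}
core = core_species(coverage_example)
print(core)

# Share of the catalogue that is core, using the counts reported in the text
core_percent = round(100 * 17 / 110)
print(core_percent)
```

Raising min_cov to 10.0 applies the stricter ≥10× criterion mentioned in the text.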
Protologues for new Candidatus taxa identified by analysis of metagenome-assembled genomes from equine faeces.
Description of Candidatus Alistipes equi sp. nov.
Candidatus Alistipes equi (e'qui. L. gen. masc. n. equi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_MB2_80 and which is available via NCBI BioSample SAMN18472495. The GC content of the type genome is 40.8% and the genome length is 2.08 Mbp.
Description of Candidatus Apopatocola gen. nov.
Candidatus Apopatocola (A.po.pa.to'cola. Gr. masc. n. apopatos, dung; N.L. masc./fem. suffix -cola, an inhabitant; N.L. fem. n. Apopatocola a microbe associated with faeces) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Apopatocola equi. This is a new name for the GTDB alphanumeric genus UBA738, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Oscillospirales and to the family Oscillospiraceae.
Description of Candidatus Apopatocola equi sp. nov.
Candidatus Apopatocola equi (e'qui. L. gen. masc. n. equi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E1_MB2_75 and which is available via NCBI BioSample SAMN18472466. The GC content of the type genome is 59.6% and the genome length is 1.56 Mbp.
Description of Candidatus Apopatosoma gen. nov.
Candidatus Apopatosoma (A.po.pa.to.so'ma. Gr. masc. n. apopatos, dung; Gr. neut. n. soma, a body; N.L. neut. n. Apopatosoma, a microbe associated with faeces) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Apopatosoma equi. This is a new name for the GTDB alphanumeric genus CAG-724, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Oscillospirales and to the family CAG-272.
Candidatus Apopatosoma intestinale (in.tes.ti.na'le. N.L. neut. adj. intestinale, pertaining to the intestines) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E5_133 and which is available via NCBI BioSample SAMN18472535. This is a new name for the alphanumeric GTDB species sp003524145, which is found in diverse mammalian guts. The GC content of the type genome is 53.8% and the genome length is 1.55 Mbp.
Description of Candidatus Apopatousia gen. nov.
Candidatus Apopatousia (A.po.pat.ou's.ia. Gr. masc. n. apopatos, dung; Gr. fem. n. ousia, an essence; N.L. fem. n. Apopatousia, a microbe associated with faeces) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Apopatousia equi. This is a new name for the GTDB alphanumeric genus UBA9845, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Christensenellales and to the family UBA1242.
A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E4_MB2_89 and which is available via NCBI BioSample SAMN18472531. GTDB has assigned this species to a genus marked with an alphabetical suffix. However, as this genus designation cannot be incorporated into a well-formed binomial, in naming this species, we have used the current validly published name for the genus. The GC content of the type genome is 48% and the genome length is 2.14 Mbp.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Cacconaster caballi. This is a new name for the GTDB alphanumeric genus Bact-11, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family UBA932.
A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E5_MB2_33 and which is available via NCBI BioSample SAMN18472547. The GC content of the type genome is 49% and the genome length is 1.90 Mbp.

Description of Candidatus Cacconaster scatequi sp. nov.
Candidatus Cacconaster scatequi (scat.e'qui. Gr. neut. n. skor, skatos, dung; L. masc. n. equus, a horse; N.L. gen. n. scatequi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_MB2_97 and which is available via NCBI BioSample SAMN18472499. The GC content of the type genome is 50.6% and the genome length is 1.90 Mbp.
Candidatus Cacconaster stercorequi (ster.cor.e'qui. L. masc. n. stercus, stercoris, dung; L. masc. n. equus, a horse; N.L. gen. n. stercorequi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E4_MB2_17 and which is available via NCBI BioSample SAMN18472518. The GC content of the type genome is 54.5% and the genome length is 1.83 Mbp.
Candidatus Chryseobacterium enterohippi (en.te.ro.hip'pi. Gr. neut. n. enteron, gut, bowel, intestine; Gr. masc./fem. n. hippos, a horse; N.L. gen. n. enterohippi, associated with the horse gut) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E1_189 and which is available via NCBI BioSample SAMN18472455. The GC content of the type genome is 34.3% and the genome length is 2.05 Mbp.
Description of Candidatus Colenecus gen. nov.
Candidatus Colenecus (Col.en.e'cus. L. neut. n. colon, large intestine; N.L. masc. n. enecus, an inhabitant; N.L. masc. n. Colenecus, a microbe associated with the large intestine) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Colenecus caballi. This is a new name for the GTDB alphanumeric genus UBA1179, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family Bacteroidaceae.
Candidatus Colenecus caballi (ca.bal'li. L. gen. masc. n. caballi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_160 and which is available via NCBI BioSample SAMN18472483. The GC content of the type genome is 49.7% and the genome length is 2.25 Mbp.
Description of Candidatus Colicola gen. nov.
Candidatus Colicola (Co.li.co'la. L. neut. n. colon, large intestine; N.L. masc./fem. suffix -cola, an inhabitant; N.L. fem. n. Colicola, a microbe associated with the large intestine) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Colicola caballi. This is a new name for the GTDB alphanumeric genus RF16, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family Paludibacteraceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Colimonas fimequi. This is a new name for the GTDB alphanumeric genus UBA1191, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Peptostreptococcales and to the family Anaerovoracaceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Colimorpha merdihippi. This is a new name for the GTDB alphanumeric genus UBA1711, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family P3.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Colinaster scatohippi. This is a new name for the GTDB alphanumeric genus UBA1712, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Lachnospirales and to the family Lachnospiraceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Coliplasma caballi. This is a new name for the GTDB alphanumeric genus UBA1752, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Oscillospirales and to the family CAG-382.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Colivivens caballi. This is a new name for the GTDB alphanumeric genus UBA1786, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family Bacteroidaceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Equicaccousia limihippi. This is a new name for the GTDB alphanumeric genus UMGS1279, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Oscillospirales and to the family Acutalibacteraceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Faecinaster equi. This is a new name for the GTDB alphanumeric genus UBA6382, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family Bacteroidaceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Fiminaster equi. This is a new name for the GTDB alphanumeric genus UBA3207, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order RFN20 and to the family CAG-826.

Description of Candidatus Fiminaster equi sp. nov.
Candidatus Fiminaster equi (e'qui. L. gen. masc. n. equi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E4_MB2_69 and which is available via NCBI BioSample SAMN18472528. The GC content of the type genome is 34.5% and the genome length is 0.89 Mbp.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Hennigella equi. This is a new name for the GTDB alphanumeric genus RUG11194, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Mycoplasmatales and to the family Mycoplasmoidaceae.
Description of Candidatus Hennigella equi sp. nov.
Candidatus Hennigella equi sp. nov. (e'qui. L. gen. masc. n. equi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E4_MB2_29 and which is available via NCBI BioSample SAMN18472521. The GC content of the type genome is 31.4% and the genome length is 0.64 Mbp.
Description of Candidatus Hennigimonas gen. nov.
Candidatus Hennigimonas gen. nov. (N.L. masc. n. hennigi derived from the Latinised family name of Willi Hennig; L. fem. n. monas, unit, monad; a microbe named in honour of Willi Hennig, founder of phylogenetic systematics) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Hennigimonas equi. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family UBA932.
Description of Candidatus Hennigimonas equi sp. nov.
Candidatus Hennigimonas equi sp. nov. (e'qui. L. gen. masc. n. equi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_MB2_147 and which is available via NCBI BioSample SAMN18472491. The GC content of the type genome is 52% and the genome length is 1.47 Mbp.
Description of Candidatus Hippenecus gen. nov.
Candidatus Hippenecus (Hipp.en.e'cus. Gr. masc./fem. n. hippos, a horse; N.L. masc. n. enecus, an inhabitant; N.L. masc. n. Hippenecus a microbe associated with horses) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Hippenecus merdae. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Lachnospirales and to the family Lachnospiraceae.
Candidatus Hippenecus merdae (mer'dae. L. gen. fem. n. merdae, of faeces) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_87 and which is available via NCBI BioSample SAMN18472489. The GC content of the type genome is 52.7% and the genome length is 1.11 Mbp.
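The two cut-offs used throughout these descriptions (≥95% ANI to a species' type genome, ≥60% AAI to the type genome of a genus' type species) can be expressed as a simple decision rule. The following is a minimal sketch only: the function name and the example identity values are hypothetical, and the ANI and AAI percentages are assumed to have been computed beforehand by external tools.

```python
# Thresholds quoted in the protologues; everything else here is illustrative.
SPECIES_ANI_MIN = 95.0  # % ANI to the species' type genome
GENUS_AAI_MIN = 60.0    # % AAI to the type genome of the genus' type species


def classify_against_type(ani_percent: float, aai_percent: float) -> str:
    """Place a query genome relative to the type genome of a type species.

    ani_percent and aai_percent are assumed to be precomputed by external
    tools; this hypothetical helper only applies the two cut-offs.
    """
    if ani_percent >= SPECIES_ANI_MIN:
        return "type species"
    if aai_percent >= GENUS_AAI_MIN:
        return "same genus, different species"
    return "outside genus"


# Hypothetical identity values, not taken from the study.
print(classify_against_type(97.3, 92.0))
print(classify_against_type(82.0, 71.5))
print(classify_against_type(70.0, 48.0))
```

Checking ANI first reflects the nesting of the ranks: a genome that meets the species cut-off against the type genome of the type species belongs to that species and therefore to the genus.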
Description of Candidatus Hippobium gen. nov.
Candidatus Hippobium (Hip.po'bi.um. Gr. masc./fem. n. hippos, a horse; Gr. masc. n. bios, life; N.L. neut. n. Hippobium, a microbe associated with horses) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Hippobium faecium. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order UBA5829 and to the family UBA5829.
Candidatus Hippobium faecium (fae'ci.um. L. fem. n. faex, dregs; L. gen. pl. n. faecium, of the dregs, of faeces) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_206 and which is available via NCBI BioSample SAMN18472485. The GC content of the type genome is 39.1% and the genome length is 2.12 Mbp.
A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E1_MB2_88 and which is available via NCBI BioSample SAMN18472468. The GC content of the type genome is 35.7% and the genome length is 3.58 Mbp.
Description of Candidatus Limimonas gen. nov.
Candidatus Limimonas (Li.mi.mo'nas. L. masc. n. limus, dung; L. fem. n. monas, a monad; N.L. fem. n. Limimonas, a microbe associated with faeces) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Limimonas coprohippi. This is a new name for the GTDB alphanumeric genus UBA1227, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Oscillospirales and to the family Acutalibacteraceae.
Candidatus Limimonas coprohippi (co.pro.hip'pi. Gr. fem. n. kopros, dung; Gr. masc./fem. n. hippos, a horse; N.L. gen. n. coprohippi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E1_MB2_82 and which is available via NCBI BioSample SAMN18472467. The GC content of the type genome is 40.5% and the genome length is 1.33 Mbp.
Candidatus Limimonas egerieequi (e.ge.ri.e.e'qui. L. fem. n. egeries, dung; L. masc. n. equus, a horse; N.L. gen. n. egerieequi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E5_MB2_129 and which is available via NCBI BioSample SAMN18472544. The GC content of the type genome is 41.8% and the genome length is 1.70 Mbp.
Candidatus Limimorpha caballi (ca.bal'li. L. gen. masc. n. caballi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E5_119 and which is available via NCBI BioSample SAMN18472534. The GC content of the type genome is 48.3% and the genome length is 2.76 Mbp.
Description of Candidatus Limimorpha equi sp. nov.
Candidatus Limimorpha equi (e'qui. L. gen. masc. n. equi, of a horse) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E1_MB2_99 and which is available via NCBI BioSample SAMN18472470. The GC content of the type genome is 45.1% and the genome length is 2.72 Mbp.
Description of Candidatus Liminaster gen. nov.
Candidatus Liminaster (Li.mi.nas'ter. L. masc. n. limus, dung; Gr. masc. n. naster, an inhabitant; N.L. masc. n. Liminaster, a microbe associated with faeces) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Liminaster caballi. This is a new name for the GTDB alphanumeric genus UBA3663, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family UBA3663.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Merdinaster equi. This is a new name for the GTDB alphanumeric genus UBA7050, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Lachnospirales and to the family Lachnospiraceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Minthenecus merdequi. This is a new name for the GTDB alphanumeric genus SFVR01, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family Paludibacteraceae.

Description of Candidatus Minthenecus merdequi sp. nov.
Candidatus Minthenecus merdequi (merd.e'qui. L. fem. n. merda, faeces; L. masc. n. equus, a horse; N.L. gen. n. merdequi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E5_MB2_18 and which is available via NCBI BioSample SAMN18472545. The GC content of the type genome is 42.5% and the genome length is 1.80 Mbp.
Description of Candidatus Minthocola gen. nov.
Candidatus Minthocola (Min.tho'co.la. Gr. masc. n. minthos, dung; N.L. masc./fem. suffix -cola, an inhabitant; N.L. fem. n. Minthocola, a microbe associated with faeces) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Minthocola equi. This is a new name for the GTDB alphanumeric genus UBA3774, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Lachnospirales and to the family Lachnospiraceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Minthomonas equi. This is a new name for the GTDB alphanumeric genus CAG-831, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family UBA932.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Minthousia equi. This is a new name for the GTDB alphanumeric genus UBA4293, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family Bacteroidaceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Physcosoma equi. This is a new name for the GTDB alphanumeric genus UBA5920, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Sphaerochaetales and to the family Sphaerochaetaceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Scatonaster coprocaballi. This is a new name for the GTDB alphanumeric genus Firm-16, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Saccharofermentanales and to the family Saccharofermentanaceae.
A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Scybalocola fimicaballi. This is a new name for the GTDB alphanumeric genus UBA1723, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family Paludibacteraceae.

Description of Candidatus Scybalocola fimicaballi sp. nov.
Candidatus Scybalocola fimicaballi (fi.mi.ca.bal'li. L. masc. n. fimus, dung; L. masc. n. caballus, a horse; N.L. gen. n. fimicaballi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E1_25 and which is available via NCBI BioSample SAMN18472456. This is a new name for the alphanumeric GTDB species sp002317115, which is found in diverse mammalian guts. The GC content of the type genome is 41.7% and the genome length is 3.11 Mbp.
Description of Candidatus Scybalousia gen. nov.
Candidatus Scybalousia (Scy.bal.ou's.ia. Gr. neut. n. skybalon, dung; Gr. fem. n. ousia, an essence; N.L. fem. n. Scybalousia, a microbe associated with faeces) A bacterial genus identified by metagenomic analyses. The genus includes all bacteria with genomes that show ≥60% average amino acid identity (AAI) to the genome of the type strain from the type species Candidatus Scybalousia scybalohippi. This is a new name for the GTDB alphanumeric genus Phil12, which is found in diverse mammalian guts. This genus has been assigned by GTDB-Tk v1.3.0 working on GTDB Release 06-RS202 (Olm et al., 2017;Scales et al., 2018) to the order Bacteroidales and to the family P3.
Candidatus Scybalousia scybalohippi (scy.ba.lo.hip'pi. Gr. neut. n. skybalon, dung; Gr. masc./fem. n. hippos, a horse; N.L. gen. n. scybalohippi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_144 and which is available via NCBI BioSample SAMN18472482. The GC content of the type genome is 35.4% and the genome length is 2.63 Mbp.
Candidatus Sodaliphilus aphodohippi (aph.o.do.hip'pi. Gr. fem. n. aphodos, dung; Gr. masc./fem. n. hippos, a horse; N.L. gen. n. aphodohippi, associated with the faeces of horses) A bacterial species identified by metagenomic analyses. This species includes all bacteria with genomes that show ≥95% average nucleotide identity (ANI) to the type genome for the species to which we have assigned the MAG ID E3_0 and which is available via NCBI BioSample SAMN18472480. The GC content of the type genome is 50% and the genome length is 2.49 Mbp.
bacteriophage catalogue included the model Escherichia coli phages T4 (Tevenvirinae) and T7 (Studiervirinae) or the Mycobacterium-infecting actinophages. We observed several novel viral clusters comprising only genomes assembled in this study, which could be classified as the first representatives of new horse hindgut-associated phage families. Based on the proteome comparisons (Fig. 3B), we predict at least three new families. Over three quarters of the recovered phage genomes were found at >1× coverage in just a single sample (Fig. 3C), with observed phage genomes ranging in coverage from 27 to 65. Similar inter-individual variation in phage abundance and diversity has been described within the human gut microbiome despite evidence of strong temporal stability within individuals (Ogilvie & Jones, 2015). Variation in phage composition between samples is probably driven by environmental and host-derived factors, although the balance of these influences is yet to be defined (Duerkop, 2018). Only one phage was found in all five samples, with coverage ranging from 1.9× to 29×; it formed a cluster with Lactococcus phage P087 of the family Siphoviridae. The small sample size makes it impractical at this stage to define a core virome for the horse using criteria applied to the human gut microbiome (Manrique et al., 2016).
High-quality or Complete phage genomes assembled from five equine faecal metagenomes and compared against a de-replicated RefSeq database of reference prokaryotic virus genomes. Each node represents a viral genome, with node colour depicting source sample and node size scaled according to metagenome contig length. Grey nodes depict reference genomes, with no size scaling shown. Network edges indicate statistically significant relationships between the protein profiles of respective viral genomes. Annotation has been provided to highlight viral clusters of interest. (C) Upset plot of phage genomes shared between or specific to source faecal samples; set colour is defined by sample. Each bar represents the number of phage genomes described within the given samples.
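The sample-sharing pattern summarised in the upset plot (Fig. 3C) amounts to asking, for each phage genome, in which samples its coverage exceeds 1×. A minimal sketch of that tally follows; the function names, the phage and sample labels, and the coverage values are all hypothetical and are not data from this study.

```python
from collections import Counter

MIN_COVERAGE = 1.0  # fold coverage above which a phage counts as present


def samples_present(coverage_by_sample, min_coverage=MIN_COVERAGE):
    """Return the set of samples in which a phage exceeds the coverage cut-off."""
    return frozenset(
        sample for sample, cov in coverage_by_sample.items() if cov > min_coverage
    )


def tally_by_sample_count(coverage_table, min_coverage=MIN_COVERAGE):
    """Count phage genomes by the number of samples in which they are present."""
    return Counter(
        len(samples_present(cov, min_coverage)) for cov in coverage_table.values()
    )


# Hypothetical coverage table (phage -> sample -> fold coverage).
example = {
    "phage_A": {"S1": 27.0, "S2": 0.0, "S3": 0.2, "S4": 0.0, "S5": 0.0},
    "phage_B": {"S1": 1.9, "S2": 4.0, "S3": 29.0, "S4": 2.5, "S5": 3.1},
    "phage_C": {"S1": 0.0, "S2": 0.0, "S3": 0.0, "S4": 65.0, "S5": 0.9},
}
counts = tally_by_sample_count(example)
print(counts[1], "phage(s) in a single sample;", counts[5], "in all five")
```

Keeping the per-phage result as a set of sample names, rather than only a count, preserves the information an upset plot needs to show which specific combinations of samples share a genome.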

DISCUSSION
Compared to the human gut, the microbiology of the horse gut remains largely unexplored. Here, we deliver new insights into this important ecosystem while also showcasing the advantages of shotgun metagenomics in providing catalogues of genes and genome sequences that take us well beyond what can be achieved using 16S ribosomal RNA gene sequences. Exploration of just five faecal samples allowed us to discover, and recover genomes from, nearly 100 new bacterial and archaeal species and to recover nearly 200 bacteriophage genomes, substantially increasing the known microbial diversity of this environment. Deposition of genomes from these species into publicly available databases will underpin future studies by improving the quality of reference-based taxonomic assignments.
While the limited scope of this study means it cannot hope to provide a comprehensive view of taxonomic diversity within the horse gut, it gives us a tantalizing glimpse of the richness that awaits us when such approaches are rolled out more widely, particularly as integration of long-read sequencing into metagenomics brings the promise of genome assemblies rivaling those from cultured isolates (Moss, Maghini & Bhatt, 2020;Nicholls et al., 2019). These advances will help to bridge the gap between the taxonomic profiles already defined through amplicon sequencing and newly uncovered MAGs by allowing incorporation of complex repetitive elements into assemblies, which are often missed by current assembly algorithms. Just as the horse allowed humans to explore new external landscapes, new sequencing and bioinformatics approaches will allow us to explore the inner world of the equine gut microbiome.

CONCLUSIONS
This research generates an introductory census of the thoroughbred horse gut microbiome and its associated metabolic potential that goes far beyond the scope of currently available metagenomic studies, which often rely upon 16S rRNA gene sequence analyses. Here, we present dozens of novel bacterial genera and species. Assignment of previously unnamed species to Candidatus binomials, as employed here, provides an important precedent for the continued description of these organisms as they are uncovered in other biological environments.
Dave Baker performed the experiments, authored or reviewed drafts of the paper, and approved the final draft. Roberto M La Ragione conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft. Christopher Proudman conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft. Mark J Pallen conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Animal Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): Completed under The University of Surrey's ethical review framework, project code: NERA-2017-007-SVM

Data Availability
The following information was supplied regarding data availability: The data are available at BioProject: PRJNA590977

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.13084#supplemental-information.